Final Project - Clash Royale Analysis

Group: 'The Goblin Gang'
Dev Bhardwaj, Neal Machado, Michael Xie
Respective UIDs: 117212624, 117143096, 117226089
CMSC 320 Section 0101

Project Motivation

Clash Royale is a free mobile game released by Supercell in 2016. Players build decks of cards and use those decks in 1v1 battles against each other. Players gain trophies for ladder matches that they win, and lose trophies for ladder matches that they lose.

Apart from the main battle experience itself, one of the essential goals of Clash Royale players is to create a deck which is effective against many other deck archetypes. This can be done in many ways: through strategic card choice (normally accomplished via trial and error), by unlocking new cards (of different rarities), and by spending in-game currency to upgrade card levels and improve card stats. As of May 2022, there are 107 different cards, of Common, Rare, Epic, Legendary, and Champion rarities. Furthermore, each card has an elixir cost (between 1 and 9 elixir) and each card has a card level, which can be upgraded to a maximum of level 14.

As three avid Clash Royale players, in this project we wanted to analyze the match data of many battles to learn more about Clash Royale card interactions. We wanted to see which cards provide good value, which cards have high win percentages, which cards have high skill gaps, and more. Armed with this knowledge, Clash Royale players (like ourselves) can use the analytics to improve their match performance and win more often.


Part 1: Data Collection + Curation

To run our analysis, we need a large list of Clash Royale battles. The Supercell API does not offer an easy way to get a large dataset of unrelated battles, so we take a more roundabout approach. We first use the API to gather a list of 10000 clans, each with at least 30 members. (Clans are groups of players who collaborate with each other.) Supercell does not document in detail how the clan request works, so we do not know whether the clans we receive are truly random. We then request the member list of each clan and, to keep our data recent, keep only members who were last seen within the past 14 days. This gives us a list of active members from the clans we queried. Next, we request the battle log of each member, which contains the last 25 games they played within a certain amount of time. We are only interested in PvP battles, so we include only those battles in our dataset. The result is a large dataset of PvP battles to analyze.

Imports

In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import statistics
import numpy as np
from IPython.display import display
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
import math
import statsmodels.api as sm
from sklearn.feature_selection import chi2
import clashroyale

Creating a Large Dataframe of Varied Battle Data

In [ ]:
# sanitize helper function: converts a Clash Royale tag from the form '#STRING' to the form '%23STRING',
# since %23 is the URL-encoded form of '#' and is what we must use in request URLs
def sanitize(player_tag):
    return "%23" + player_tag[1:]
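
As a quick sanity check on why `%23` is the right prefix: `#` is a reserved character in URLs, and `%23` is its percent-encoding. The sketch below compares the hand-written helper against the standard library's `urllib.parse.quote`. The tag `#ABC123` is a made-up example, and `sanitize` is repeated here only so the snippet runs on its own.

```python
from urllib.parse import quote

# same helper as above, repeated so this snippet is self-contained
def sanitize(player_tag):
    return "%23" + player_tag[1:]

example_tag = "#ABC123"  # hypothetical tag, for illustration only

# safe="" asks quote() to percent-encode every reserved character
encoded = quote(example_tag, safe="")
print(encoded)                          # %23ABC123
print(encoded == sanitize(example_tag)) # True
```

For tags made only of letters and digits after the `#`, both approaches give the same string.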

One of the main tools we use is the Clash Royale Developer API. To make successful requests, a developer token is needed. More information (including the API docs) can be found here.

In [ ]:
import requests
import json

neal_player_tag = sanitize('#23YQYY99U2')

# Clash Royale API developer token (omitted for security; substitute your own)
my_token = '<YOUR_DEVELOPER_TOKEN>'

We also utilize an unofficial Python library called clashroyale. More information (including docs) can be found here.

In [ ]:
# define our unofficial client
client = clashroyale.official_api.Client(
    token=my_token,
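    is_async=False,  # use the synchronous client
    error_debug=False,  # debugging toggle, left off
    session=None,  # optional HTTP session (requests.Session or aiohttp.ClientSession)
    timeout=10,  # timeout for API requests
    url='https://api.clashroyale.com/v1',  # API base URL
    camel_case=False,  # naming style used for data keys
    constants=None,  # optional constants override
    user_agent="The Goblin Gang"  # user agent sent with our requests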
)

The first thing we do is request 10000 clans, each of which must have at least 30 members. We are given each clan's unique ID, which we sanitize so that we can use it in URL requests.

In [ ]:
# request 10000 clans
r1=requests.get(f"https://api.clashroyale.com/v1/clans?minMembers=30", headers={"Accept":"application/json", "authorization":f"Bearer {my_token}"}, params = {"limit":10000})
test = r1.json()

# stores the IDs of the clans we get from our request
clans = []

# sanitize clan IDs
for item in test['items']:
    clans.append(sanitize(item['tag']))

The next thing we do is use our list of clan tags to request the members of each clan. Note that the request below passes a `limit` of 10, so it asks for at most 10 members per clan. We process the returned JSON to isolate each member's tag, name, King Tower level, trophies, and arena. We store this in a dictionary mapping player tag (String) to an array of the other attributes.

Before writing a player to our dictionary of players, we check the datetime of when they were last seen. We only want data which is recent and relevant, so we only keep players who have been active within the past 14 days.

In [ ]:
from datetime import date, datetime, timedelta
import time

# store the number of possible players
total_possible_players = 0

today = datetime.today()    # datetime for current time
players = {}                # dictionary for our players

for clan in clans:
    time.sleep(0.1)        # sleep so that we do not exceed our request rate limit

    # request clan member data
    request=requests.get(f"https://api.clashroyale.com/v1/clans/{clan}/members", headers={"Accept":"application/json", "authorization":f"Bearer {my_token}"}, params = {"limit":10})
    clan_json = request.json()

    # loop through each member in the clan, adding them to our dictionary of players if they have been active recently
    for member in clan_json['items']:
        last_seen = member['lastSeen']
        delta = client.get_datetime(last_seen, False) - today
        if (abs(delta) < timedelta(days=14)):
            players[sanitize(member['tag'])] = [member['name'], member['expLevel'], member['trophies'], member['arena']]

        total_possible_players += 1

print("total possible:", total_possible_players, "\nactual active players:", len(players))
total possible: 8320 
actual active players: 8209
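
The 14-day activity check can also be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code we ran: it assumes `lastSeen` strings look like `20220510T120000.000Z` (UTC), and the helper name `is_recently_active` and the sample timestamps are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed layout of the API's lastSeen field, e.g. "20220510T120000.000Z"
LAST_SEEN_FORMAT = "%Y%m%dT%H%M%S.%fZ"

def is_recently_active(last_seen, now, max_age_days=14):
    """Return True if last_seen is within max_age_days of now."""
    seen = datetime.strptime(last_seen, LAST_SEEN_FORMAT).replace(tzinfo=timezone.utc)
    return abs(now - seen) < timedelta(days=max_age_days)

# fixed reference time so the example is reproducible
now = datetime(2022, 5, 15, tzinfo=timezone.utc)

print(is_recently_active("20220510T120000.000Z", now))  # True  (about 4.5 days ago)
print(is_recently_active("20220401T120000.000Z", now))  # False (about 43.5 days ago)
```

Comparing two timezone-aware UTC datetimes avoids mixing a UTC timestamp with local time, although for a 14-day window a few hours of offset makes little practical difference.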

We define our constants.

In [ ]:
# defining our cards and elixir

cards_dict = {'Knight': '3', 'Archers': '3', 'Goblins': '2', 'Giant': '5', 'P.E.K.K.A': '7', 'Minions': '3', 'Balloon': '5', 'Witch': '5', 'Barbarians': '5', 'Golem': '8', 'Skeletons': '1', 'Valkyrie': '4', 'Skeleton Army': '3', 'Bomber': '3', 'Musketeer': '4', 'Baby Dragon': '4', 'Prince': '5', 'Wizard': '5', 'Mini P.E.K.K.A': '4', 'Spear Goblins': '2', 'Giant Skeleton': '6', 'Hog Rider': '4', 'Minion Horde': '5', 'Ice Wizard': '3', 'Royal Giant': '6', 'Guards': '3', 'Princess': '3', 'Dark Prince': '4', 'Three Musketeers': '9', 'Lava Hound': '7', 'Ice Spirit': '1', 'Fire Spirit': '1', 'Miner': '3', 'Sparky': '6', 'Bowler': '5', 'Lumberjack': '4', 'Battle Ram': '4', 'Inferno Dragon': '4', 'Ice Golem': '2', 'Mega Minion': '3', 'Dart Goblin': '3', 'Goblin Gang': '3', 'Electro Wizard': '4', 'Elite Barbarians': '6', 'Hunter': '4', 'Executioner': '5', 'Bandit': '3', 'Royal Recruits': '8', 'Night Witch': '4', 'Bats': '2', 'Royal Ghost': '3', 'Ram Rider': '5', 'Zappies': '4', 'Rascals': '5', 'Cannon Cart': '5', 'Mega Knight': '7', 'Skeleton Barrel': '3', 'Flying Machine': '4', 'Wall Breakers': '2', 'Royal Hogs': '5', 'Goblin Giant': '6', 'Fisherman': '3', 'Magic Archer': '4', 'Electro Dragon': '5', 'Firecracker': '3', 'Mighty Miner': '4', 'Super Witch': '6', 'Elixir Golem': '3', 'Battle Healer': '4', 'Skeleton King': '4', 'Archer Queen': '5', 'Golden Knight': '4', 'Skeleton Dragons': '4', 'Mother Witch': '4', 'Electro Spirit': '1', 'Electro Giant': '7', 'Cannon': '3', 'Goblin Hut': '5', 'Mortar': '4', 'Inferno Tower': '5', 'Bomb Tower': '4', 'Barbarian Hut': '7', 'Tesla': '4', 'Elixir Collector': '6', 'X-Bow': '6', 'Tombstone': '3', 'Furnace': '4', 'Goblin Cage': '4', 'Goblin Drill': '4', 'Fireball': '4', 'Arrows': '3', 'Rage': '2', 'Rocket': '6', 'Goblin Barrel': '3', 'Freeze': '4', 'Mirror': '0', 'Lightning': '6', 'Zap': '2', 'Poison': '4', 'Graveyard': '5', 'The Log': '2', 'Tornado': '3', 'Clone': '3', 'Earthquake': '3', 'Barbarian Barrel': '2', 'Heal Spirit': '1', 
'Giant Snowball': '2', 'Royal Delivery': '3'}
cards = list(cards_dict.keys())
rarities = 'Common, Common, Common, Rare, Epic, Common, Epic, Epic, Common, Epic, Common, Rare, Epic, Common, Rare, Epic, Epic, Rare, Rare, Common, Epic, Rare, Common, Legendary, Common, Epic, Legendary, Epic, Rare, Legendary, Common, 0, Legendary, Legendary, Epic, Legendary, Rare, Legendary, Rare, Rare, Rare, Common, Legendary, Common, Epic, Epic, Legendary, Common, Legendary, Common, Legendary, 0, Rare, Common, Epic, Legendary, Common, Rare, 0, Rare, Epic, 0, Legendary, Epic, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, Common, Rare, Common, Rare, Rare, Rare, Common, Rare, Epic, Rare, Rare, 0, 0, Rare, Common, Epic, Rare, Epic, Epic, Epic, Epic, Common, Epic, Legendary, Legendary, Epic, Epic, 0, Epic, 0, Common, 0'.split(", ")
rarity_dict = {'Knight': 'Common', 'Archers': 'Common', 'Goblins': 'Common', 'Giant': 'Rare', 'P.E.K.K.A': 'Epic', 'Minions': 'Common', 'Balloon': 'Epic', 'Witch': 'Epic', 'Barbarians': 'Common', 'Golem': 'Epic', 'Skeletons': 'Common', 'Valkyrie': 'Rare', 'Skeleton Army': 'Epic', 'Bomber': 'Common', 'Musketeer': 'Rare', 'Baby Dragon': 'Epic', 'Prince': 'Epic', 'Wizard': 'Rare', 'Mini P.E.K.K.A': 'Rare', 'Spear Goblins': 'Common', 'Giant Skeleton': 'Epic', 'Hog Rider': 'Rare', 'Minion Horde': 'Common', 'Ice Wizard': 'Legendary', 'Royal Giant': 'Common', 'Guards': 'Epic', 'Princess': 'Legendary', 'Dark Prince': 'Epic', 'Three Musketeers': 'Rare', 'Lava Hound': 'Legendary', 'Ice Spirit': 'Common', 'Fire Spirit': 'Common', 'Miner': 'Legendary', 'Sparky': 'Legendary', 'Bowler': 'Epic', 'Lumberjack': 'Legendary', 'Battle Ram': 'Rare', 'Inferno Dragon': 'Legendary', 'Ice Golem': 'Rare', 'Mega Minion': 'Rare', 'Dart Goblin': 'Rare', 'Goblin Gang': 'Common', 'Electro Wizard': 'Legendary', 'Elite Barbarians': 'Common', 'Hunter': 'Epic', 'Executioner': 'Epic', 'Bandit': 'Legendary', 'Royal Recruits': 'Common', 'Night Witch': 'Legendary', 'Bats': 'Common', 'Royal Ghost': 'Legendary', 'Ram Rider': 'Legendary', 'Zappies':
'Rare', 'Rascals': 'Common', 'Cannon Cart': 'Epic', 'Mega Knight': 'Legendary', 'Skeleton Barrel': 'Common', 'Flying Machine': 'Rare', 'Wall Breakers': 'Epic', 'Royal Hogs': 'Rare', 'Goblin Giant': 'Epic', 'Fisherman': 'Legendary', 'Magic Archer': 'Legendary', 'Electro Dragon': 'Epic', 'Firecracker': 'Common', 'Mighty Miner': 'Champion', 'Super Witch': 'Legendary', 'Elixir Golem': 'Rare', 'Battle Healer': 'Rare', 'Skeleton King': 'Champion', 'Archer Queen': 'Champion', 'Golden Knight': 'Champion',
'Skeleton Dragons': 'Common', 'Mother Witch': 'Legendary', 'Electro Spirit': 'Common', 'Electro Giant': 'Epic', 'Cannon': 'Common', 'Goblin Hut': 'Rare', 'Mortar': 'Common', 'Inferno Tower': 'Rare', 'Bomb Tower': 'Rare', 'Barbarian Hut': 'Rare', 'Tesla': 'Common', 'Elixir Collector': 'Rare', 'X-Bow': 'Epic', 'Tombstone': 'Rare', 'Furnace': 'Rare', 'Goblin Cage': 'Rare', 'Goblin Drill': 'Epic', 'Fireball': 'Rare', 'Arrows': 'Common', 'Rage': 'Epic', 'Rocket': 'Rare', 'Goblin Barrel': 'Epic', 'Freeze': 'Epic', 'Mirror': 'Epic', 'Lightning': 'Epic', 'Zap': 'Common', 'Poison': 'Epic', 'Graveyard': 'Legendary', 'The Log': 'Legendary', 'Tornado': 'Epic', 'Clone': 'Epic', 'Earthquake': 'Rare', 'Barbarian Barrel': 'Epic', 'Heal Spirit': 'Rare', 'Giant Snowball': 'Common', 'Royal Delivery': 'Common'}
trophies = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000]

We define many helper functions to allow us to parse through battle JSON data and create the rows of our Pandas dataframe.

process_battles() takes in a json list of battles, and outputs a list of rows to be added to our dataframe. Row form is as follows:

  • arena (String)
  • blue trophies (int)
  • blue deck (String array)
  • blue levels (int array)
  • blue average elixir (float)
  • blue rarity score (float)
  • blueVector (int Array)
  • red trophies (int)
  • red deck (String array)
  • red levels (int array)
  • red average elixir (float)
  • red rarity score (float)
  • redVector (int Array)
  • winner ("Blue" or "Red")

numeric_rarity() returns an integer between 1 and 5 depending on the Rarity of the card inputted as a String.

process_deck() takes in a deck JSON and outputs a String array of the deck, an integer array of the deck's cards' levels, the deck's average elixir cost, and the deck's average rarity. One thing to note is that the API does not return the card levels shown in-game. For example, Legendary cards range from level 9 to 14, but the Clash Royale API returns an integer between 1 and 6 for them (the range has the same width, 14 - 9 = 6 - 1, but the values are shifted). To correct for this, we add a constant to each level depending on card rarity, so that every card's maximum level is 14.

vectorize_deck() takes in a String array for a deck, and outputs a vector of 108 variables (1 for if the card is in the deck, and 0 for if the card is not).
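
To make the vector encoding concrete, here is a small self-contained sketch of the same idea on a made-up four-card list. The function name `one_hot` and the list `toy_cards` are hypothetical and exist only for this illustration; the real vector has one position per entry in `cards`.

```python
def one_hot(deck, vocabulary):
    """Return a 0/1 list with a 1 at the position of every card in deck."""
    # map each card name to its position once, instead of searching the list per card
    index = {name: i for i, name in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for name in deck:
        vector[index[name]] = 1
    return vector

toy_cards = ["Knight", "Archers", "Goblins", "Giant"]  # toy stand-in for the full card list

print(one_hot(["Giant", "Knight"], toy_cards))  # [1, 0, 0, 1]
```

The dictionary lookup is a small design choice: it gives the same result as calling `list.index` for every card, but avoids rescanning the card list each time.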

In [ ]:
# dataframe row: arena (String), blue trophies (int), blue deck (String array), blue levels (int array), blue average elixir (float), blue rarity score (float), blueVector (int Array), red trophies (int),
# red deck (String array), red levels (int array), red average elixir (float), red rarity score (float), redVector (int Array), winner ('Blue' or 'Red')

# takes in a json list of battles (as returned from a 'Player Battles' request) and outputs a list of rows to be added to dataframe
def process_battles(battles):
    rows = []

    for battle in battles:
        row = []
        team = battle['team'][0]
        opponent = battle['opponent'][0]
        # we need trophy data for both sides and the trophy change to decide the winner
        team_required_data = 'startingTrophies' in team and 'trophyChange' in team
        opp_required_data = 'startingTrophies' in opponent
        is_pvp = battle['type'] == 'PvP'
        if is_pvp and team_required_data and opp_required_data:     # eliminate challenge battles, friendly battles (not ladder), and tutorial battles (in which case we have no starting trophies)
            row.append(battle['arena']['name'])                 # add arena to row
            row.append(battle['team'][0]['startingTrophies'])      # add blue_trophies to row
            blue_deck, blue_levels, blue_avg_elixir, blue_rarity_score = process_deck(battle['team'][0]['cards'])      # process blue's json deck
            row.append(blue_deck)                               # add blue deck (str array) to row
            row.append(blue_levels)                             # add blue deck levels to row
            row.append(blue_avg_elixir)                         # add blue avg. elixir to row
            row.append(blue_rarity_score)                       # add blue rarity scoring to row
            row.append(vectorize_deck(blue_deck))               # add vectorized blue deck to row
            row.append(battle['opponent'][0]['startingTrophies'])      # add red_trophies to row
            red_deck, red_levels, red_avg_elixir, red_rarity_score = process_deck(battle['opponent'][0]['cards'])      # process red's json deck
            row.append(red_deck)                               # add red deck (str array) to row
            row.append(red_levels)                             # add red deck levels to row
            row.append(red_avg_elixir)                         # add red avg. elixir to row
            row.append(red_rarity_score)                       # add red rarity scoring to row
            row.append(vectorize_deck(red_deck))               # add vectorized red deck to row


            # see whether blue or red won
            if(battle['team'][0]['trophyChange'] > 0):
                row.append('Blue')
            else:
                row.append('Red')

            rows.append(row)

    return rows


def numeric_rarity(str_rarity):
    if(str_rarity == 'Common'):
        return 1
    elif(str_rarity == 'Rare'):
        return 2
    elif(str_rarity == 'Epic'):
        return 3
    elif(str_rarity == 'Legendary'):
        return 4
    else:
        return 5

# takes in a deck json and outputs a string array of the deck, int array of levels, deck's average elixir cost, and deck's rarity score
def process_deck(deck):
    deck_str = []
    deck_level = []
    total_elixir = 0
    total_rarity = 0
    mirror_in_deck = False

    # normalize card level depending on rarity
    for card in deck:
        deck_str.append(card['name'])
        if rarity_dict[card['name']] == 'Common':
            deck_level.append(card['level'])
        elif rarity_dict[card['name']] == 'Rare':
            deck_level.append(card['level'] + 2)
        elif rarity_dict[card['name']] == 'Epic':
            deck_level.append(card['level'] + 5)
        elif rarity_dict[card['name']] == 'Legendary':
            deck_level.append(card['level'] + 8)
        else:
            deck_level.append(card['level'] + 10)
        total_rarity += numeric_rarity(rarity_dict[card['name']])
        if card['name'] == "Mirror":
            mirror_in_deck = True
        else:
            total_elixir += int(cards_dict[card['name']])

    # calculate rarity score
    rarity_score = total_rarity / 8

    # calculate average elixir cost (accounting for if the Mirror card is in the deck)
    if(mirror_in_deck):
        total_elixir += ((total_elixir) / 7) + 1
    average_elixir = total_elixir / 8

    return deck_str, deck_level, average_elixir, rarity_score

# takes in a string array of deck and creates a vectorized version (0s and 1s) to represent whether certain cards are in the deck
# note that since cards[0] is Knight, x0 will be 1 if Knight is in the deck and 0 otherwise. Similarly, since cards[1] is Archers, x1 will be 1 if
# Archers is in the deck and 0 otherwise. The rest of the cards follow the same pattern.
def vectorize_deck(str_deck_arr):
    empty_vector = [0] * len(cards)
    for card in str_deck_arr:
        empty_vector[cards.index(card)] = 1

    return empty_vector

    

Since non-common cards start at levels higher than level 1, their display level and the level returned by the API are not the same. Therefore, we have to adjust all non-common cards by varying amounts to normalize the levels across all cards.
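
The adjustment amounts to adding a fixed offset per rarity. The sketch below restates the offsets used in process_deck() as a lookup table and checks that each rarity's highest API level lands on 14. The names `LEVEL_OFFSET` and `normalized_level` are hypothetical, introduced only for this illustration.

```python
# offsets taken from process_deck(): amount added to the API level for each rarity
LEVEL_OFFSET = {"Common": 0, "Rare": 2, "Epic": 5, "Legendary": 8, "Champion": 10}

def normalized_level(api_level, rarity):
    """Shift an API-reported level onto the common 1-14 scale."""
    return api_level + LEVEL_OFFSET[rarity]

# the highest API level for each rarity (14 minus its offset) should map to 14
for rarity, offset in LEVEL_OFFSET.items():
    assert normalized_level(14 - offset, rarity) == 14

print(normalized_level(1, "Legendary"))  # 9, the lowest Legendary level
print(normalized_level(6, "Legendary"))  # 14, the highest Legendary level
```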

Because our dataset is so large, we decided to serialize it as a JSON and read it in.

In [ ]:
# read in list of battles

# use a context manager so the file is closed once it has been read
with open('battles_10k.json') as f:
    battles = json.load(f)
In [ ]:
import pandas as pd

# create dataframe of battles
columns = ["arena", "blue_trophies", "blue_deck", "blue_levels", "blue_average_elixir", "blue_rarity_score", "blue_vector", "red_trophies", \
    "red_deck", "red_levels", "red_average_elixir", "red_rarity_score", "red_vector", "winner"]
df = pd.DataFrame(columns=columns)

df
Out[ ]:
arena blue_trophies blue_deck blue_levels blue_average_elixir blue_rarity_score blue_vector red_trophies red_deck red_levels red_average_elixir red_rarity_score red_vector winner
In [ ]:
# add data to dataframe
rows_to_add = process_battles(battles)

# build the dataframe in one step; assigning rows one at a time with df.loc is very slow for this many rows
df = pd.DataFrame(rows_to_add, columns=columns)

df
Out[ ]:
arena blue_trophies blue_deck blue_levels blue_average_elixir blue_rarity_score blue_vector red_trophies red_deck red_levels red_average_elixir red_rarity_score red_vector winner
0 Arena 3 601 [Mini P.E.K.K.A, Cannon, Poison, Baby Dragon, ... [6, 4, 8, 8, 5, 4, 7, 3] 4.000 2.000 [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ... 619 [Valkyrie, Skeleton Army, Goblin Cage, Muskete... [6, 8, 6, 6, 7, 7, 8, 6] 4.000 2.250 [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, ... Red
1 Arena 3 615 [Mini P.E.K.K.A, Cannon, Poison, Baby Dragon, ... [6, 4, 8, 8, 5, 4, 7, 3] 4.000 2.000 [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ... 618 [Mini P.E.K.K.A, Archers, Bomber, Zap, Firebal... [7, 6, 5, 6, 6, 6, 5, 8] 3.375 1.625 [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ... Red
2 Arena 3 630 [Mini P.E.K.K.A, Cannon, Poison, Baby Dragon, ... [6, 4, 8, 8, 5, 4, 7, 3] 4.000 2.000 [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ... 630 [Prince, Battle Ram, Skeleton Army, Baby Drago... [9, 3, 6, 6, 7, 4, 7, 7] 3.875 2.250 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, ... Red
3 Arena 3 600 [Mini P.E.K.K.A, Cannon, Poison, Baby Dragon, ... [6, 4, 8, 8, 5, 4, 7, 3] 4.000 2.000 [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ... 600 [Bomber, Musketeer, Minions, Arrows, Fireball,... [5, 5, 4, 4, 5, 5, 5, 5] 3.500 1.375 [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, ... Blue
4 Arena 3 568 [Mini P.E.K.K.A, Musketeer, Poison, Baby Drago... [6, 6, 8, 8, 6, 7, 6, 8] 4.125 2.375 [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, ... 600 [Musketeer, Minions, Arrows, Fireball, Knight,... [5, 5, 5, 6, 5, 5, 6, 5] 3.625 1.500 [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, ... Blue
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
86457 Arena 5 1360 [Wall Breakers, Goblin Cage, Bats, Dark Prince... [7, 8, 6, 8, 8, 7, 9, 10] 3.375 2.250 [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ... 1362 [Hog Rider, Baby Dragon, Goblin Barrel, Spear ... [8, 8, 9, 8, 8, 8, 8, 8] 3.500 2.125 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ... Blue
86458 Arena 5 1330 [Wall Breakers, Goblin Cage, Bats, Dark Prince... [7, 8, 6, 8, 8, 7, 9, 10] 3.375 2.250 [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ... 1330 [Tombstone, Skeleton Army, Baby Dragon, Wizard... [8, 8, 8, 8, 7, 8, 10, 7] 3.625 2.625 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, ... Blue
86459 Arena 5 1300 [Wall Breakers, Barbarians, Bats, Dark Prince,... [7, 8, 6, 8, 8, 6, 9, 10] 3.750 2.375 [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, ... 1300 [Skeleton Army, Electro Spirit, Fire Spirit, B... [8, 6, 8, 8, 7, 9, 7, 9] 2.625 1.625 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, ... Blue
86460 Arena 5 1328 [Wall Breakers, Cannon, Bats, P.E.K.K.A, Infer... [7, 8, 6, 9, 8, 5, 9, 10] 3.750 2.250 [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ... 1328 [Miner, Battle Ram, Wall Breakers, Witch, Gobl... [9, 9, 6, 8, 8, 8, 6, 7] 2.625 2.250 [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, ... Red
86461 Arena 5 1360 [Hog Rider, P.E.K.K.A, Baby Dragon, Skeleton A... [7, 8, 6, 7, 8, 7, 8, 6] 4.125 2.250 [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, ... 1369 [Skeleton Army, Bomber, Musketeer, Arrows, Fir... [9, 6, 7, 6, 6, 6, 6, 6] 3.500 1.625 [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, ... Red

86462 rows × 14 columns

In [ ]:
#  we see that we have lots of issues with outliers (0 and 100% win percentages) so we want to look at how distributed our battle data is over trophy ranges
test_players_trophies = [0] * len(trophies)

for i in range(0, len(trophies)-1):
    
    df2 = df[(df['blue_trophies'] > trophies[i]) & (df['blue_trophies'] <= trophies[i+1])]
    test_players_trophies[i] = len(df2)

fig, ax = plt.subplots(figsize=(16, 10), dpi=100)
ax.set_xticks(range(len(trophies)))
ax.set_xticklabels(trophies)

ax.bar(range(len(trophies)), test_players_trophies)

ax.set_xlabel("Trophy Range")
ax.set_ylabel("Number of Battles")
ax.set_title("Distribution of Battle Dataset Across Trophies")
plt.show()

The graph above shows how the battles in our dataset are distributed across trophy ranges. The distribution is unimodal and strongly left-skewed. The likely explanation is that it is generally easy to advance through the early trophy ranges as long as the player is somewhat competent, but around the 5000-trophy range it becomes increasingly hard to climb further, so most players end up stuck there (colloquially known as being "hardstuck"). This is why a large share of the battles take place around this trophy range.
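
The per-range counts behind this kind of graph can also be computed in a single pass instead of filtering the dataframe once per range. The sketch below is one way to do that with NumPy, keeping the same half-open ranges (lower bound exclusive, upper bound inclusive) as the loop above. The sample trophy values are made up, and the variable names are hypothetical.

```python
import numpy as np

edges = np.arange(0, 9001, 500)  # 0, 500, ..., 9000 (same boundaries as `trophies`)
sample = np.array([601, 615, 1360, 4999, 5000, 5001, 5400])  # made-up trophy counts

# side="left" places v in bin i when edges[i] < v <= edges[i + 1]
bin_index = np.searchsorted(edges, sample, side="left") - 1

# drop values outside (0, 9000], then count how many fall in each bin
in_range = (bin_index >= 0) & (bin_index < len(edges) - 1)
counts = np.bincount(bin_index[in_range], minlength=len(edges) - 1)

for i in np.flatnonzero(counts):
    print(f"({edges[i]}, {edges[i + 1]}]: {counts[i]}")
# (500, 1000]: 2
# (1000, 1500]: 1
# (4500, 5000]: 2
# (5000, 5500]: 2
```

With the real data, `sample` would be replaced by something like `df['blue_trophies'].to_numpy()`.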

Part III: Exploratory Data Analysis

Card Win Percentage

One of the first things we do is calculate the win percentage of each card. We can do this fairly simply by iterating through our dataframe, looking at the Blue and Red decks, and seeing who wins.

In [ ]:
# win percentage based on card
card_win = [0] * len(cards) #all 0's
card_appearance = [0]*len(cards) #all 0's
card_win_percentages = [0]*len(cards) #all 0's

# dataframe row: arena (String), blue trophies (int), blue deck (String array), blue levels (int array), blue average elixir (float), blue rarity score (float), blueVector (int Array), red trophies (int), red deck (String array), red levels (int array), red average elixir (float), red rarity score (float), redVector (int Array), winner ('Blue' or 'Red')
for i in range(0, len(cards)): #for each card
    for index, row in df.iterrows():
        if cards[i] in row['blue_deck']: #if card is in blue deck
            card_appearance[i]+=1 #up corresponding appearance by one
            if row['winner'] == 'Blue': #if blue won
                card_win[i]+=1 #up win count
        if cards[i] in row['red_deck']: #if card is in red deck
            card_appearance[i]+=1 #up corresponding appearance by one
            if row['winner'] == 'Red': #if red won, that is blue didn't win, then up by 1
                card_win[i]+=1

for i in range(len(card_win)):
    if (card_appearance[i] != 0):
        card_win_percentages[i] = card_win[i] / card_appearance[i]

for i in range(len(cards)):
    print("Card:", cards[i], ", Win Percentage:", card_win_percentages[i])
Card: Knight , Win Percentage: 0.4558655345693987
Card: Archers , Win Percentage: 0.41520903113757596
Card: Goblins , Win Percentage: 0.4074074074074074
Card: Giant , Win Percentage: 0.4285211975085393
Card: P.E.K.K.A , Win Percentage: 0.5128154767750172
Card: Minions , Win Percentage: 0.45059465259769294
Card: Balloon , Win Percentage: 0.5037970504072199
Card: Witch , Win Percentage: 0.5310503770134523
Card: Barbarians , Win Percentage: 0.49930011198208285
Card: Golem , Win Percentage: 0.508235294117647
Card: Skeletons , Win Percentage: 0.45177313883299797
Card: Valkyrie , Win Percentage: 0.518647697720667
Card: Skeleton Army , Win Percentage: 0.5287127164692546
Card: Bomber , Win Percentage: 0.49890476687180557
Card: Musketeer , Win Percentage: 0.47799320509873044
Card: Baby Dragon , Win Percentage: 0.5188163524005704
Card: Prince , Win Percentage: 0.5187758332600475
Card: Wizard , Win Percentage: 0.5162489774670931
Card: Mini P.E.K.K.A , Win Percentage: 0.5174584492474499
Card: Spear Goblins , Win Percentage: 0.46546003016591253
Card: Giant Skeleton , Win Percentage: 0.4986778009742519
Card: Hog Rider , Win Percentage: 0.49755131238650746
Card: Minion Horde , Win Percentage: 0.49513162639740355
Card: Ice Wizard , Win Percentage: 0.5012109987177661
Card: Royal Giant , Win Percentage: 0.4849943374858437
Card: Guards , Win Percentage: 0.48233046800382046
Card: Princess , Win Percentage: 0.5204474209176404
Card: Dark Prince , Win Percentage: 0.5208597948216903
Card: Three Musketeers , Win Percentage: 0.47393364928909953
Card: Lava Hound , Win Percentage: 0.5163487738419619
Card: Ice Spirit , Win Percentage: 0.4680110256782243
Card: Fire Spirit , Win Percentage: 0.4687719298245614
Card: Miner , Win Percentage: 0.5112853767656965
Card: Sparky , Win Percentage: 0.49357718266927936
Card: Bowler , Win Percentage: 0.503560528992879
Card: Lumberjack , Win Percentage: 0.5230081764594029
Card: Battle Ram , Win Percentage: 0.5066448579022694
Card: Inferno Dragon , Win Percentage: 0.5166603043027758
Card: Ice Golem , Win Percentage: 0.43154583582983824
Card: Mega Minion , Win Percentage: 0.5106448626966985
Card: Dart Goblin , Win Percentage: 0.4774717603756822
Card: Goblin Gang , Win Percentage: 0.5097059068566192
Card: Electro Wizard , Win Percentage: 0.5162529340635892
Card: Elite Barbarians , Win Percentage: 0.5173198339116091
Card: Hunter , Win Percentage: 0.5003496503496504
Card: Executioner , Win Percentage: 0.5089090340731478
Card: Bandit , Win Percentage: 0.5022723447231106
Card: Royal Recruits , Win Percentage: 0.43465346534653465
Card: Night Witch , Win Percentage: 0.4977168949771689
Card: Bats , Win Percentage: 0.5068786850098267
Card: Royal Ghost , Win Percentage: 0.49088575096277276
Card: Ram Rider , Win Percentage: 0.4981000151998784
Card: Zappies , Win Percentage: 0.4385342789598109
Card: Rascals , Win Percentage: 0.4512372634643377
Card: Cannon Cart , Win Percentage: 0.4691011235955056
Card: Mega Knight , Win Percentage: 0.5281848659003832
Card: Skeleton Barrel , Win Percentage: 0.4909261576971214
Card: Flying Machine , Win Percentage: 0.4797588285960379
Card: Wall Breakers , Win Percentage: 0.49331651954602773
Card: Royal Hogs , Win Percentage: 0.4433547514372675
Card: Goblin Giant , Win Percentage: 0.4254807692307692
Card: Fisherman , Win Percentage: 0.46404682274247494
Card: Magic Archer , Win Percentage: 0.4768920924316303
Card: Electro Dragon , Win Percentage: 0.4662576687116564
Card: Firecracker , Win Percentage: 0.4987535953978907
Card: Mighty Miner , Win Percentage: 0.40816326530612246
Card: Super Witch , Win Percentage: 0
Card: Elixir Golem , Win Percentage: 0.43746430611079384
Card: Battle Healer , Win Percentage: 0.44142614601018676
Card: Skeleton King , Win Percentage: 0.5073170731707317
Card: Archer Queen , Win Percentage: 0.5061728395061729
Card: Golden Knight , Win Percentage: 0.5218855218855218
Card: Skeleton Dragons , Win Percentage: 0.46627373935821875
Card: Mother Witch , Win Percentage: 0.48256735340729
Card: Electro Spirit , Win Percentage: 0.44363521215959467
Card: Electro Giant , Win Percentage: 0.4905982905982906
Card: Cannon , Win Percentage: 0.46899867374005305
Card: Goblin Hut , Win Percentage: 0.35884057971014494
Card: Mortar , Win Percentage: 0.4112
Card: Inferno Tower , Win Percentage: 0.4986610968294773
Card: Bomb Tower , Win Percentage: 0.43521341463414637
Card: Barbarian Hut , Win Percentage: 0.3128834355828221
Card: Tesla , Win Percentage: 0.4935681773204037
Card: Elixir Collector , Win Percentage: 0.47858942065491183
Card: X-Bow , Win Percentage: 0.4592061742006615
Card: Tombstone , Win Percentage: 0.4828067504123842
Card: Furnace , Win Percentage: 0.46898588775845096
Card: Goblin Cage , Win Percentage: 0.5113857420757524
Card: Goblin Drill , Win Percentage: 0.4246575342465753
Card: Fireball , Win Percentage: 0.4761000436789031
Card: Arrows , Win Percentage: 0.49307011020214997
Card: Rage , Win Percentage: 0.4928635147190009
Card: Rocket , Win Percentage: 0.46700662927078024
Card: Goblin Barrel , Win Percentage: 0.5229487069335853
Card: Freeze , Win Percentage: 0.4788770053475936
Card: Mirror , Win Percentage: 0.47169533892074694
Card: Lightning , Win Percentage: 0.49471890971039184
Card: Zap , Win Percentage: 0.5167201670458441
Card: Poison , Win Percentage: 0.4818812644564379
Card: Graveyard , Win Percentage: 0.4524714828897338
Card: The Log , Win Percentage: 0.5119685774536791
Card: Tornado , Win Percentage: 0.48519793459552496
Card: Clone , Win Percentage: 0.4462260461478295
Card: Earthquake , Win Percentage: 0.43245078071961984
Card: Barbarian Barrel , Win Percentage: 0.5164371772805508
Card: Heal Spirit , Win Percentage: 0.4305210918114144
Card: Giant Snowball , Win Percentage: 0.4345034246575342
Card: Royal Delivery , Win Percentage: 0.4817244611059044
In [ ]:
# graph each card's win percentage

fig, ax = plt.subplots(figsize=(25, 10), dpi=100)
ax.set_xticks(range(len(cards)))
ax.set_xticklabels(cards)

ax.bar(cards, card_win_percentages, width=.8)
ax.set_xlabel("Card")
ax.set_ylabel("Win Percentage")
ax.set_title("Win Percentages of All Cards")

plt.xticks(rotation=90)

plt.show()

As we can see, the vast majority of card win percentages are between 45% and 50%. A couple of win percentages are worth noting. The "Super Witch" is not an actual card (it is a special card for a challenge), so it never shows up in PvP battles. The Barbarian Hut has the lowest win percentage (about 31%), with the Goblin Hut only slightly higher (about 36%).
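To pick out the extremes without reading them off the bar chart, one option is to sort the cards by win percentage. The sketch below is illustrative only: it uses a small hand-copied, rounded subset of the values printed above, and the names `sample_cards`, `sample_win_percentages`, and `rank_cards` are hypothetical stand-ins for the notebook's `cards` and `card_win_percentages` lists.

```python
# Hypothetical subset of the printed output above (rounded), standing in for
# the notebook's full `cards` and `card_win_percentages` lists.
sample_cards = ["Goblin Hut", "Barbarian Hut", "Goblin Barrel", "Golden Knight", "Mortar"]
sample_win_percentages = [0.3588, 0.3129, 0.5229, 0.5219, 0.4112]

def rank_cards(names, win_percentages):
    """Return (name, win_percentage) pairs sorted from lowest to highest."""
    return sorted(zip(names, win_percentages), key=lambda pair: pair[1])

ranked = rank_cards(sample_cards, sample_win_percentages)
print("Lowest:", ranked[0])
print("Highest:", ranked[-1])
```

Running the same function on the full lists would give the complete ranking.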

Card Elixir Efficiency¶

Now that we have each card's win percentage, we look to see how effective it is for its elixir. To do so, we look to graph each card's win percentage against its elixir cost.

In [ ]:
import statistics

# helper function to find the mean (parameter renamed so it does not shadow the built-in list)
def average(values):
    return sum(values) / len(values)

# create a list of elixir costs of the cards
all_elixirs = list(cards_dict.values())
for i in range(len(all_elixirs)):
    all_elixirs[i] = int(all_elixirs[i])


plt.figure(figsize=(16, 10), dpi=100)

# plot each scatter point (elixir cost, win percentage)
for i in range(len(cards)):
    plt.scatter(all_elixirs[i], card_win_percentages[i])

x1 = np.asarray(list(all_elixirs)).reshape(-1, 1)
y1 = np.asarray(card_win_percentages)

# create a linear regression model
regr2 = linear_model.LinearRegression()
regr2.fit(x1, y1)
regression_line = regr2.predict(x1)
m2 = regr2.coef_[0]
b2 = regr2.intercept_

# plot linear regression model
regr_label = "Regression: y = " + str(round(m2, 3)) + "x + " + str(round(b2, 3))
plt.plot(x1, regression_line, color="orange", label=regr_label)

# show our graph
plt.xlabel("Card Elixir Cost")
plt.ylabel("Win Percentage")
plt.title("Winning Percentage vs. Card Elixir Cost")
plt.legend(loc="upper right")
plt.show()

Although we cannot see labels for individual points (they would make the graph too hard to interpret), we can see from this graph that cards toward the upper left are more efficient (in terms of gaining more wins per elixir) and cards toward the bottom right are the least elixir efficient. Moreover, cards whose points are above the line of best fit are more elixir efficient than the average card, and cards whose points are below the line of best fit are less elixir efficient than the average card.

Our line of best fit has a slope of -0.003, meaning that for each additional elixir a card costs, its predicted win percentage decreases by 0.003 (that is, 0.3 percentage points).
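As a quick worked example of what this slope implies, the sketch below multiplies the rounded slope by the largest possible difference in elixir cost (1 versus 9 elixir). The function name `predicted_change` is hypothetical, and the result only reflects the fitted line, not any individual card.

```python
# Slope reported in the plot legend (rounded to three decimals).
slope = -0.003

def predicted_change(slope, elixir_difference):
    """Change in predicted win fraction for a given difference in elixir cost."""
    return slope * elixir_difference

# Compare a 1-elixir card with a 9-elixir card.
change = predicted_change(slope, 9 - 1)
print(f"Predicted change: {change:.3f} ({change * 100:.1f} percentage points)")
```

Even across the full cost range, the fitted line predicts a difference of only a few percentage points.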

We have seen what win percentage against card elixir cost looks like. We now want to look at win percentage against normalized elixir cost. We calculate normalized elixir cost as follows:

$$\text{standardized\_elixir}_i = \frac{\text{elixir}_i - \text{avg\_elixir}}{\text{sd}}$$

where avg_elixir is the mean elixir cost and sd is the (population) standard deviation of elixir cost over all cards.

In [ ]:
# calculate average and standard deviation of elixir
all_elixirs_avg = average(all_elixirs)
sd_elixir = statistics.pstdev(all_elixirs)

# calculate standardized elixirs
standardized_elixirs = []
for i in range(len(card_win_percentages)):
    standardized_elixirs.append((all_elixirs[i] - all_elixirs_avg) / sd_elixir)


plt.figure(figsize=(16, 10), dpi=100)

# plot scatter points (standardized elixir, win percentage)
for i in range(len(cards)):
    plt.scatter(standardized_elixirs[i], card_win_percentages[i])


x1 = np.asarray(standardized_elixirs).reshape(-1, 1)
y1 = np.asarray(card_win_percentages)

# create a linear regression model
regr2 = linear_model.LinearRegression()
regr2.fit(x1, y1)
regression_line = regr2.predict(x1)
m2 = regr2.coef_[0]
b2 = regr2.intercept_

# plot our regression model
regr_label = "Regression: y = " + str(round(m2, 3)) + "x + " + str(round(b2, 3))
plt.plot(x1, regression_line, color="orange", label=regr_label)

# show our graph
plt.xlabel("Standardized Elixir Cost")
plt.ylabel("Win Percentage")
plt.title("Winning Percentage vs. Standardized Elixir Cost")
plt.legend(loc="upper right")
plt.show()

Again, points which represent cards toward the upper left are more efficient (in terms of gaining more wins per elixir) and cards toward the bottom right are the least elixir efficient. Likewise, cards whose points are above the line of best fit are more elixir efficient than the average card, and cards whose points are below the line of best fit are less elixir efficient than the average card.

Our line of best fit has a slope of -0.004, meaning that as a card's elixir cost increases by one standard deviation, its predicted win percentage decreases by 0.004 (that is, 0.4 percentage points).
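Standardizing the x-axis only rescales it, so the standardized slope should equal the raw slope multiplied by the standard deviation of elixir cost. The sketch below checks this on made-up numbers (not taken from the battle dataset); `ols_slope` is a hypothetical helper that computes the least-squares slope directly rather than through scikit-learn.

```python
import statistics

def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator

# Made-up example data, not taken from the battle dataset.
elixirs = [1, 2, 3, 3, 4, 5, 6, 7]
win_rates = [0.50, 0.49, 0.47, 0.51, 0.48, 0.46, 0.47, 0.45]

mean_elixir = statistics.fmean(elixirs)
sd_elixir = statistics.pstdev(elixirs)
standardized = [(e - mean_elixir) / sd_elixir for e in elixirs]

raw_slope = ols_slope(elixirs, win_rates)
standardized_slope = ols_slope(standardized, win_rates)

# The standardized slope is the raw slope scaled by the standard deviation.
print(raw_slope * sd_elixir, standardized_slope)
```

The two printed values agree up to floating-point error, which is why the two plots show the same pattern with differently scaled x-axes.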

Card Win Percentage Over Trophies¶

Next, we want to see how each card's win percentage varies over trophy range. To do this, we first define trophy ranges of 500 trophies each from 0 to 9000. Then, we calculate the win percentage of each card for each trophy range. Since the number of cards is so high, it is extremely hard to discern any useful information when all the data is plotted on one graph. Therefore, we group the cards by rarity and plot first the commons' win percentages over trophy range, then the rares', and so on.
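As a small illustration of the binning step, the sketch below maps a trophy count to the index of its range using the standard library's `bisect` module. It assumes the edges are exactly 0, 500, ..., 9000 and that each range excludes its lower edge and includes its upper edge; `trophy_edges` and `trophy_bin` are hypothetical names, not part of the notebook.

```python
from bisect import bisect_left

# Assumed bin edges 0, 500, ..., 9000, matching the ranges described above.
trophy_edges = list(range(0, 9001, 500))

def trophy_bin(trophy_count, edges):
    """Return j such that edges[j] < trophy_count <= edges[j + 1], or None if outside all bins."""
    j = bisect_left(edges, trophy_count) - 1
    if 0 <= j < len(edges) - 1:
        return j
    return None

print(trophy_bin(500, trophy_edges))   # falls in the first range (0, 500]
print(trophy_bin(501, trophy_edges))   # falls in the second range (500, 1000]
print(trophy_bin(9001, trophy_edges))  # outside every range
```

Looking up the bin per battle this way avoids re-filtering the whole dataset for every range, at the cost of a slightly less direct reading than a boolean filter.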

In [ ]:
# calculate card win percentages per trophy range

card_win_per_range = []
for j in range(0, len(trophies)-1):
    card_win = [0] * len(cards)         # wins per card in this trophy range
    card_appearance = [0] * len(cards)  # appearances per card in this trophy range
    # dataframe row: arena (int), blue trophies (int), blue deck (String array), blue levels (int array), blue average elixir (float), red trophies (int), red deck (String array), red levels (int array), red average elixir (float), winner ('Blue' or 'Red'), blueVector (int Array), redVector (int Array)
    # battles are assigned to a range by the blue player's trophies; the filter does not depend on the card, so we compute it once per range
    df2 = df[(df['blue_trophies'] > trophies[j]) & (df['blue_trophies'] <= trophies[j+1])]
    for i in range(0, len(cards)):      # for each card
        for index, row in df2.iterrows():
            if cards[i] in row['blue_deck']:    # if card is in blue deck
                card_appearance[i] += 1         # count the appearance
                if row['winner'] == 'Blue':     # if blue won
                    card_win[i] += 1            # count the win
            if cards[i] in row['red_deck']:     # if card is in red deck
                card_appearance[i] += 1         # count the appearance
                if row['winner'] == 'Red':      # if red won
                    card_win[i] += 1            # count the win
    win_pct = [0] * len(cards)          # stays 0 for cards with no appearances in this range
    for i in range(0, len(cards)):
        if card_appearance[i] != 0:
            win_pct[i] = card_win[i] / card_appearance[i]
    card_win_per_range.append(win_pct)

Ultimately, we want to plot the average win percentage of each rarity over trophy range. To check our approach, we first plot the win percentage of a single card, the Knight.

In [ ]:
# plot the Knight's win percentage per trophy range; index 0 of each per-range list is taken to be the Knight
# missing data: a stored win percentage of 0 means there was not enough (or any) data to calculate it, so we plot NaN instead of a true 0%

knight_win_percentages = [np.nan] * len(trophies)

for i in range(len(card_win_per_range)):
    if card_win_per_range[i][0] == 0:
        knight_win_percentages[i] = np.nan
    else:
        knight_win_percentages[i] = card_win_per_range[i][0]


plt.figure(figsize=(16, 10), dpi=100)
plt.plot(trophies, knight_win_percentages)

# add labels to the plot and show the graph
plt.xlabel("Trophy Range")
plt.ylabel("Win Percentage")
plt.title("Win Percentage of Knight over Trophies")
plt.show()

Now that we know our methodology works, we do the same process for all cards in the Clash Royale game. Since there are too many cards for us to observe in just one graph, we split the graphs up by rarity.

In [ ]:
from matplotlib import interactive

### showing win percentage of card varieties over trophy ranges ###

# function for number theory division algorithm (given an n and divisor d, returns quotient q and remainder r)
def division(n, d):
    q = 0
    while (d <= n):
        q += 1
        n -= d
    return q, n


# divide our cards by rarity into common, rare, epic, legendary, and champion
common_cards, rare_cards, epic_cards, legendary_cards, champion_cards = [], [], [], [], []
for k,v in rarity_dict.items():
    if v == 'Common':
        common_cards.append(k)
    elif v == 'Rare':
        rare_cards.append(k)
    elif v == 'Epic':
        epic_cards.append(k)
    elif v == 'Legendary':
        legendary_cards.append(k)
    else:
        champion_cards.append(k)

cards_separated_by_rarity = [common_cards, rare_cards, epic_cards, legendary_cards, champion_cards]
avgs_rarity_data = []

# graph win percentage over trophies for each card rarity
for rarity_set in cards_separated_by_rarity:


    # used to find averages
    avg_rarity_data = [np.nan] * len(trophies)
    nonzero_data = [0] * len(trophies)

    # set up the graph size
    plt.figure(figsize=(16, 10), dpi=100)


    # set up graph size -- split into n^2 subplots (where n^2 smallest square greater than the number of cards in the rarity type) so that we can have a square grid of graphs
    # n = math.ceil(math.sqrt(len(rarity_set)))
    # fig, ax = plt.subplots(nrows=n, ncols=n, figsize=(16, 10), dpi=100)

    for i in range(len(rarity_set)):
        k = cards.index(rarity_set[i])
        # q,r = division(i, n)

        data = [np.nan] * len(trophies)
        
        for j in range(len(card_win_per_range)):
            if card_win_per_range[j][k] == 0 or card_win_per_range[j][k] == 1:      # claim it is outlier if 0 or 1 (not enough data)
                data[j] = np.nan
            else:
                data[j] = card_win_per_range[j][k]
                nonzero_data[j] += 1
                if np.isnan(avg_rarity_data[j]):
                    avg_rarity_data[j] = data[j]
                else:
                    avg_rarity_data[j] += data[j]

        plt.plot(trophies, data, label=cards[k])
        # ax[q, r].plot(trophies, data)
        # ax[q, r].set_title(label=cards[k])


    # add labels to the plot and show the graph
    plt.xlabel("Trophy Range")
    plt.ylabel("Win Percentage")
    plt.title(f"Win Percentage of {rarity_dict[rarity_set[0]]} Cards Over Trophies")

    plt.legend(loc="upper left")
    plt.show()

    # calculate averages for this rarity
    for j in range(len(trophies)):
        if nonzero_data[j] != 0:
            avg_rarity_data[j] /= nonzero_data[j]

    avgs_rarity_data.append(avg_rarity_data)



# graph averages for card rarities
plt.figure(figsize=(16, 10), dpi=100)

for i in range(len(cards_separated_by_rarity)):
    plt.plot(trophies, avgs_rarity_data[i], label=rarity_dict[cards_separated_by_rarity[i][0]])

plt.xlabel("Trophy Range")
plt.ylabel("Win Percentage")
plt.title("Average Win Percentage of Varied Rarities of Cards Over Trophies")

plt.legend(loc="upper left")
plt.show()

From these graphs, we can gauge the "skill expression" of certain cards, namely whether a card is better utilized by better players. For example, the Royal Giant has around a 50% winrate in low to medium trophy ranges, but near the higher trophy ranges it skyrockets to around 80%. This suggests that more skilled players utilize the card much better than less skilled players. Conversely, if a card has a high winrate in lower trophy ranges but falls off later, then it can be considered a "noob trap": it is only abusable against less skilled players. Some cards, such as the Barbarian Hut, have low winrates throughout, which suggests that they are simply weak.

Finally, we can see how the average winrate of commons, rares, etc. changes over trophy range. One interesting trend is that commons and rares start with lower winrates that increase as trophies increase, while epics and legendaries start with higher winrates that drop off. This makes sense, since higher skilled players are better at piloting cards that might seem useless to less skilled players.

However, it is important to note that the vast majority of our data rests around the 5000 trophy range, which means that our analysis is most accurate around that range as well. This is reflected in our graphs, since that is where most cards sit around a 50% winrate. Near the extremes of our trophy ranges we see much more extreme winrates, which is largely due to smaller sample sizes.
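One way to make the small-sample caveat explicit is to hide any win rate computed from too few battles. The sketch below is an alternative to treating exact values of 0 or 1 as missing, not what the cells above do: `win_rate_or_nan` is a hypothetical helper, and the threshold of 30 appearances is an arbitrary assumption chosen only for illustration.

```python
import math

def win_rate_or_nan(wins, appearances, min_appearances=30):
    """Win fraction, or NaN when there are too few battles to trust it."""
    if appearances < min_appearances:
        return math.nan
    return wins / appearances

print(win_rate_or_nan(4, 5))     # too few battles, reported as nan
print(win_rate_or_nan(60, 100))  # enough battles, reported as 0.6
```

Because matplotlib leaves gaps at NaN values, thinly sampled trophy ranges would simply drop out of the line plots instead of showing up as spikes.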

Card Win Percentage Over Card Level¶

Next, we want to see how each card's win percentage varies with the level of the card. For each card at each level, we calculate the win percentage. As with win percentage over trophy range, we separate the cards by rarity and plot each rarity over level. Then, we plot the average win percentage of each rarity group over level.
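The per-card, per-level tally can also be gathered in a single pass over the battles by counting (card, level) pairs. The sketch below assumes each battle is a dictionary with the keys `blue_deck`, `blue_levels`, `red_deck`, `red_levels`, and `winner`, with levels listed in the same order as the deck; the function `count_wins_by_level` and the tiny two-card decks are hypothetical and only for illustration.

```python
from collections import defaultdict

def count_wins_by_level(battles):
    """Return {(card, level): (wins, appearances)} from one pass over the battles."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for battle in battles:
        for side in ("blue", "red"):
            side_won = battle["winner"] == side.capitalize()
            # levels are assumed to be listed in the same order as the deck
            for card, level in zip(battle[f"{side}_deck"], battle[f"{side}_levels"]):
                appearances[(card, level)] += 1
                if side_won:
                    wins[(card, level)] += 1
    return {key: (wins[key], appearances[key]) for key in appearances}

# Made-up battles with two-card decks (real decks hold eight cards).
battles = [
    {"blue_deck": ["Knight", "Zap"], "blue_levels": [14, 13],
     "red_deck": ["Knight", "Mortar"], "red_levels": [14, 12], "winner": "Blue"},
    {"blue_deck": ["Knight", "Mortar"], "blue_levels": [14, 12],
     "red_deck": ["Zap", "Tesla"], "red_levels": [13, 11], "winner": "Red"},
]

counts = count_wins_by_level(battles)
print(counts[("Knight", 14)])
```

Dividing wins by appearances for each key then gives the win percentage per card and level.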

In [ ]:
# calculate data for card win per card level

# Looking at how average win percentage varies over card level
card_levels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
card_win_per_level = []

for j in range(0, len(card_levels)):
    card_win = [0] * len(cards)         #all 0's
    card_appearance = [0] * len(cards)    #all 0's
# dataframe row: arena (int), blue trophies (int), blue deck (String array), blue levels (int array), blue average elixir (float), red trophies (int), red deck (String array), red levels (int array), red average elixir (float), winner ('Blue' or 'Red'), blueVector (int Array), redVector (int Array)
    for i in range(0, len(cards)):      #for each card
        for index, row in df.iterrows():
            if cards[i] in row['blue_deck']: #if card is in blue deck
                if row['blue_levels'][row['blue_deck'].index(cards[i])] == card_levels[j]:
                    card_appearance[i]+=1 #up corresponding appearance by one
                    if row['winner'] == 'Blue': #if blue won
                        card_win[i]+=1 #up win count
            if cards[i] in row['red_deck']: #if card is in red deck
                if row['red_levels'][row['red_deck'].index(cards[i])] == card_levels[j]:
                    card_appearance[i]+=1 #up corresponding appearance by one
                    if row['winner'] == 'Red': #if red won
                        card_win[i]+=1 #up win count
    win_pct = [0] * len(cards) 
    for i in range(0, len(cards)):
        if card_appearance[i] != 0:
            win_pct[i]=card_win[i]/card_appearance[i]
    card_win_per_level.append(win_pct)


After calculating card win percentages per level, we look to graph them. Again, we separate cards by their respective rarities (to make the graphs understandable) and we graph.

In [ ]:
avgs_rarity_data_lvls = []

# graph win percentage over card level for each card rarity
for rarity_set in cards_separated_by_rarity:

    # used to find averages
    avg_rarity_data_lvls = [np.nan] * len(card_levels)
    nonzero_data_lvls = [0] * len(card_levels)

    # set up the graph size
    plt.figure(figsize=(16, 10), dpi=100)


    for i in range(len(rarity_set)):
        k = cards.index(rarity_set[i])

        data_lvls = [np.nan] * len(card_levels)
        
        for j in range(len(card_win_per_level)):
            if card_win_per_level[j][k] == 0 or card_win_per_level[j][k] == 1:      # claim it is outlier if 0 or 1 (not enough data)
                data_lvls[j] = np.nan
            else:
                data_lvls[j] = card_win_per_level[j][k]
                nonzero_data_lvls[j] += 1
                if np.isnan(avg_rarity_data_lvls[j]):
                    avg_rarity_data_lvls[j] = data_lvls[j]
                else:
                    avg_rarity_data_lvls[j] += data_lvls[j]

        plt.plot(card_levels, data_lvls, label=cards[k])


    # add labels to the plot and show the graph
    plt.xlabel("Card Level")
    plt.ylabel("Win Percentage")
    plt.title(f"Win Percentage of {rarity_dict[rarity_set[0]]} Cards Over Card Level")

    plt.legend(loc="upper left")
    plt.show()

    # calculate averages for this rarity
    for j in range(len(card_levels)):
        if nonzero_data_lvls[j] != 0:
            avg_rarity_data_lvls[j] /= nonzero_data_lvls[j]

    avgs_rarity_data_lvls.append(avg_rarity_data_lvls) 



# graph averages for card rarities
plt.figure(figsize=(16, 10), dpi=100)

for i in range(len(cards_separated_by_rarity)):
    plt.plot(card_levels, avgs_rarity_data_lvls[i], label=rarity_dict[cards_separated_by_rarity[i][0]])

# plot and label graph
plt.xlabel("Card Level")
plt.ylabel("Win Percentage")
plt.title("Average Win Percentage of Varied Rarities of Cards Over Levels")

plt.legend(loc="upper left")
plt.show()

These graphs showcase how card level corresponds to win percentage. This information is useful because it shows which cards you should level up in order to maximize the impact on your win percentage: if a certain card's win percentage increases a lot after levelling it up, then that is the card you should spend your resources on. An interesting trend in the average win percentage of each rarity group over card level is that all rarities have above 50% win percentage near max level. The reasoning for this is that overlevelled cards usually win against underlevelled cards, so maxed-out cards will generally have an above-average win percentage.

Part IV: Creating A Model¶

Linear Regression Model¶

To be able to utilize a linear regression model, we create a feature vector for each battle consisting of the Blue team's deck, the Red team's deck, the Blue team's average elixir cost, the Red team's average elixir cost, Blue's average rarity (rarity score), and Red's average rarity (rarity score). Our y-values are the actual outcomes of the matches (1 for a Blue win, 0 for a Red win). We hope that our model will be able to help us predict a match outcome based on these inputs.
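To make the shape of such a feature vector concrete, here is a minimal sketch that assumes each deck is one-hot encoded over the full card list. That encoding is an assumption for illustration (this section does not show how the deck vectors were built), the rarity scores are left out, and `deck_to_vector`, `battle_features`, the four-card pool, and the listed elixir costs are all hypothetical.

```python
def deck_to_vector(deck, all_cards):
    """One-hot encode a deck: 1 if the card at that position is in the deck, else 0."""
    deck_set = set(deck)
    return [1 if card in deck_set else 0 for card in all_cards]

def battle_features(blue_deck, red_deck, all_cards, elixir_cost):
    """Concatenate both one-hot deck vectors and both average elixir costs."""
    blue_avg = sum(elixir_cost[card] for card in blue_deck) / len(blue_deck)
    red_avg = sum(elixir_cost[card] for card in red_deck) / len(red_deck)
    return (deck_to_vector(blue_deck, all_cards)
            + deck_to_vector(red_deck, all_cards)
            + [blue_avg, red_avg])

# Hypothetical four-card pool and two-card decks (real decks hold eight cards).
all_cards = ["Knight", "Zap", "Mortar", "Tesla"]
elixir_cost = {"Knight": 3, "Zap": 2, "Mortar": 4, "Tesla": 4}

features = battle_features(["Knight", "Zap"], ["Mortar", "Tesla"], all_cards, elixir_cost)
print(features)
```

Under this encoding with fixed-size decks, the average elixir cost is an exact linear combination of the one-hot entries, so a least-squares fit may struggle to assign stable coefficients to those columns.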

In [ ]:
# the independent variable is the cards, average elixir costs, and the rarity scores
x=[]
# the dependent variable is whether it leads to a win or loss
y=[]
for index, row in df.iterrows():
    temp=[]
    temp+=row['blue_vector']
    temp+=row['red_vector']
    temp.append(row['blue_average_elixir'])
    temp.append(row['red_average_elixir'])
    temp.append(row['blue_rarity_score'])
    temp.append(row['red_rarity_score'])
    x.append(temp)
    if row['winner']=="Blue":
        y.append(1)
    else:
        y.append(0)
In [ ]:
# split the data into testing and training sets
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, shuffle=True
)
In [ ]:
# create a linear model with the independent and dependent variables above
from sklearn.linear_model import LinearRegression

model=LinearRegression()
model.fit(x_train, y_train)

# display the results of the linear model
result=model.score(x_test,y_test)
print('Coefficient of Determination:', result)
Coefficient of Determination: 0.031966554635873945
In [ ]:
# use the statsmodels module to get statistics about the linear model
x2 = sm.add_constant(x_train)
est = sm.OLS(y_train, x2)
est2 = est.fit()
print(est2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.038
Model:                            OLS   Adj. R-squared:                  0.035
Method:                 Least Squares   F-statistic:                     11.98
Date:                Sun, 15 May 2022   Prob (F-statistic):               0.00
Time:                        17:10:34   Log-Likelihood:                -45408.
No. Observations:               64846   AIC:                         9.125e+04
Df Residuals:                   64631   BIC:                         9.320e+04
Df Model:                         214                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5.231e+10   2.42e+10      2.158      0.031     4.8e+09    9.98e+10
x1         -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x2         -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x3         -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x4         -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x5          4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x6         -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x7          4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x8          4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x9         -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x10         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x11        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x12        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x13         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x14        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x15        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x16         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x17         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x18        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x19        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x20        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x21         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x22        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x23        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x24         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x25        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x26         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x27         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x28         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x29        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x30         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x31        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x32        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x33         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x34         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x35         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x36         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x37        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x38         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x39        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x40        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x41        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x42        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x43         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x44        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x45         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x46         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x47         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x48        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x49         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x50        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x51         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x52         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x53        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x54        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x55         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x56         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x57        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x58        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x59         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x60        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x61         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x62         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x63         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x64         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x65        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x66         2.094e+10    1.1e+10      1.902      0.057   -6.35e+08    4.25e+10
x67        -1.517e+07    7.1e+06     -2.138      0.033   -2.91e+07   -1.26e+06
x68        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x69        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x70         2.094e+10    1.1e+10      1.902      0.057   -6.35e+08    4.25e+10
x71         2.094e+10    1.1e+10      1.902      0.057   -6.35e+08    4.25e+10
x72         2.094e+10    1.1e+10      1.902      0.057   -6.35e+08    4.25e+10
x73        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x74         1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x75        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x76         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x77        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x78        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x79        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x80        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x81        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x82        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x83        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x84        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x85         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x86        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x87        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x88        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x89         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x90        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x91        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x92         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x93        -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x94         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x95         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x96         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x97         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x98        -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x99         4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x100        1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x101        1.255e+10   7.42e+09      1.692      0.091   -1.99e+09    2.71e+10
x102        4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x103        4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x104       -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x105        4.159e+09    4.4e+09      0.945      0.344   -4.46e+09    1.28e+10
x106       -4.229e+09   3.76e+09     -1.125      0.261   -1.16e+10    3.14e+09
x107       -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x108       -1.262e+10   6.27e+09     -2.011      0.044   -2.49e+10   -3.19e+08
x109            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x110            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x111            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x112        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x113        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x114            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x115        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x116        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x117            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x118        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x119            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x120        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x121        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x122            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x123        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x124        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x125        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x126        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x127        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x128            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x129        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x130        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x131            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x132       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x133            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x134        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x135       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x136        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x137        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x138       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x139            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x140            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x141       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x142       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x143        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x144       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x145        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x146       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x147        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x148        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x149        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x150            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x151       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x152            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x153        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x154        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x155       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x156            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x157       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x158            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x159       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x160       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x161        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x162            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x163        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x164       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x165            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x166        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x167        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x168        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x169        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x170       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x171       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x172        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x173            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x174       -7.868e+09   3.65e+09     -2.154      0.031    -1.5e+10   -7.08e+08
x175        2.175e+07   9.99e+06      2.177      0.030    2.17e+06    4.13e+07
x176        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x177        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x178       -7.868e+09   3.65e+09     -2.154      0.031    -1.5e+10   -7.08e+08
x179       -7.868e+09   3.65e+09     -2.154      0.031    -1.5e+10   -7.08e+08
x180       -7.868e+09   3.65e+09     -2.154      0.031    -1.5e+10   -7.08e+08
x181            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x182       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x183            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x184        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x185            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x186        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x187            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x188        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x189        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x190        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x191            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x192        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x193        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x194        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x195        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x196        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x197        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x198        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x199            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x200        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x201        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x202        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x203        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x204        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x205        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x206            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x207        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x208       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x209       -3.401e+09   1.58e+09     -2.154      0.031   -6.49e+09   -3.07e+08
x210        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x211        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x212        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x213        1.067e+09   4.96e+08      2.150      0.032    9.42e+07    2.04e+09
x214        5.534e+09   2.57e+09      2.153      0.031    4.95e+08    1.06e+10
x215            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x216            1e+10   4.65e+09      2.153      0.031    8.96e+08    1.91e+10
x217           0.3481      0.156      2.238      0.025       0.043       0.653
x218          -0.4392      0.117     -3.741      0.000      -0.669      -0.209
x219       -6.711e+10   3.12e+10     -2.150      0.032   -1.28e+11   -5.93e+09
x220        3.574e+10   1.66e+10      2.153      0.031    3.21e+09    6.83e+10
==============================================================================
Omnibus:                   248789.363   Durbin-Watson:                   1.989
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             9377.291
Skew:                          -0.189   Prob(JB):                         0.00
Kurtosis:                       1.176   Cond. No.                     1.02e+16
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 2.8e-26. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.

Sort the slopes from the model and take the 5 largest and the 5 smallest, which indicate which factors are most helpful and most detrimental to Blue's win percentage.

In [ ]:
# rank the slopes generated by the model and label each one with its feature
import numpy as np

slopes = np.asarray(model.coef_)
N_CARDS = 108  # card columns per side: Blue cards first, then Red cards
extra_features = ["Blue Average Elixir", "Red Average Elixir",
                  "Blue Rarity Score", "Red Rarity Score"]

def feature_name(position):
    # map a coefficient index back to the feature it belongs to
    if position < N_CARDS:
        return f"Blue {cards[position]}"
    elif position < 2 * N_CARDS:
        return f"Red {cards[position - N_CARDS]}"
    return extra_features[position - 2 * N_CARDS]

# argsort returns indices directly, so tied slopes still map to distinct features
# (looking a value up with list.index would return the first match every time)
ascending = np.argsort(slopes, kind="stable")
descending = np.argsort(-slopes, kind="stable")

# display the 5 factors for each
print("Most detrimental to Blue win percentage")
for position in ascending[:5]:
    print("\t" + feature_name(position), slopes[position])
print("Most helpful to Blue win percentage")
for position in descending[:5]:
    print("\t" + feature_name(position), slopes[position])
Most detrimental to Blue win percentage
	Red Rarity Score -1043246505358.5449
	Red Ice Spirit -248797013541.3383
	Red Skeletons -248797013541.33307
	Red Fire Spirit -248797013541.3137
	Red Zap -248797013541.31287
Most helpful to Blue win percentage
	Red Mighty Miner 272826239138.1185
	Red Archer Queen 272826239138.09604
	Red Skeleton King 272826239138.06415
	Red Golden Knight 272826239138.0145
	Red Mega Knight 142420425968.40897

The slope values of the linear model are extremely large (hundreds of billions), which indicates that they are not reliable. The regression summary points to the cause: the condition number is about 1.02e+16 and the smallest eigenvalue is 2.8e-26, so the design matrix is singular or very close to it. This is likely because every deck contains exactly 8 cards and the average elixir is computed from those same cards, which makes the feature columns linearly dependent. When the columns are this collinear, the individual slopes are not uniquely determined, and the reported p-values cannot be trusted even where they fall below 0.05. The coefficient of determination is about 3%, which means the model explains only about 3% of the variation in the outcome. Therefore, these 5 most detrimental and 5 most helpful factors must be taken with a huge grain of salt. As Clash Royale players, we can say from our collective experience that an opponent having Skeleton King does not generally help you win.
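The multicollinearity warning in the regression summary can be reproduced on a toy example. The sketch below is not the project's data: the sizes, the random decks, and the `elixir_cost` values are all made up for illustration. It assumes one indicator column per card, an intercept, and an average elixir column, and shows that such a design matrix has fewer independent columns than total columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n_battles, n_cards, deck_size = 500, 12, 8  # toy sizes, not the project's

# hypothetical elixir cost (1 to 9) for each toy card
elixir_cost = rng.integers(1, 10, size=n_cards)

# one indicator column per card; every deck holds exactly deck_size cards
decks = np.zeros((n_battles, n_cards))
for row in decks:
    row[rng.choice(n_cards, size=deck_size, replace=False)] = 1

# average elixir is a linear combination of the card indicator columns
avg_elixir = decks @ elixir_cost / deck_size

# design matrix: intercept + card indicators + average elixir
X = np.column_stack([np.ones(n_battles), decks, avg_elixir])

rank = np.linalg.matrix_rank(X)
cond = np.linalg.cond(X)
print(f"columns: {X.shape[1]}, rank: {rank}, condition number: {cond:.2e}")
```

Because the indicator columns always sum to the deck size, the intercept is redundant, and the average elixir column is redundant for the same reason. Least squares can then trade huge positive and negative slopes against each other without changing the predictions, which is consistent with the slope magnitudes seen above.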

In [ ]:
# create a decision tree classifier with the independent and dependent variables above
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

dtree_clf = DecisionTreeClassifier()
dtree_clf.fit(x_train , y_train)

# display the results of the model
predicted_dt = dtree_clf.predict(x_test)
print(
    f"Classification report for classifier {dtree_clf}:\n"
    f"{metrics.classification_report(y_test, predicted_dt)}\n"
)
Classification report for classifier DecisionTreeClassifier():
              precision    recall  f1-score   support

           0       0.48      0.49      0.48      9628
           1       0.58      0.58      0.58     11988

    accuracy                           0.54     21616
   macro avg       0.53      0.53      0.53     21616
weighted avg       0.54      0.54      0.54     21616


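The decision tree classifier reached an accuracy of about 54% on the test set, which is a far better result than the linear model's fit. Our model does not take into account any of the players' actions during the battle, their match history, or their playing style, so given the countless factors that go into Clash Royale we did not expect high accuracy. The result should still be read with caution: class 1 makes up 11988 of the 21616 test battles (about 55%), so always predicting class 1 would score about 55%.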
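One way to put the 54% accuracy in context is to compare it with a majority-class baseline, meaning a classifier that always predicts the more common outcome. The sketch below only does the arithmetic on the support counts printed in the classification report; it does not retrain or re-evaluate anything, and the variable names are ours.

```python
# support counts and accuracy copied from the classification report above
support = {0: 9628, 1: 11988}
tree_accuracy = 0.54

total = sum(support.values())
majority_class = max(support, key=support.get)

# accuracy obtained by always predicting the majority class
baseline_accuracy = support[majority_class] / total

print(f"majority class: {majority_class}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"decision tree accuracy: {tree_accuracy:.2f}")
```

If the baseline comes out at or above the tree's accuracy, deck composition alone is adding little predictive signal in this setup.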